36 research outputs found
A Log Domain Pulse Model for Parametric Speech Synthesis
Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder. One of the main causes of degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer, without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it adopts a combination of speech components that are additive in the log domain. The same representation is used for voiced and unvoiced segments, rather than relying on binary voicing decisions. This avoids the voicing error discontinuities that can occur in many current vocoders. A simple binary mask is used to denote the presence of noise in the time-frequency domain, which is less sensitive to classification errors. Four experiments have been carried out to evaluate this new model. The first experiment examines the noise reconstruction issue. Three listening tests have also been carried out that demonstrate the advantages of this model: comparison with the STRAIGHT vocoder; the direct prediction of the binary noise mask by using a mixed output configuration; and partial improvements of creakiness using a mask correction mechanism. Funding: European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie; 10.13039/501100000266-EPSR
The case of Digital Writing in Instant Messaging: When cyber written productions are closer to the oral code than the written code.
The use of New Information and Communication Technologies (NICTs) has deeply changed traditional reading and writing practices. It thus seems necessary to define Digital Writing in Instant Messaging (DWIM) in order to better understand its grammatical, lexical and syntactic characteristics (the last two components define the traditional characteristics of both the oral and written codes). Thirty-two French-speaking students, around 13 years old and enrolled in 8th grade, produced one hour of DWIM on an instant messaging website in groups of two. They were free to use as many cyber languages as they wanted (we refer to this as digital writing). This corpus showed that this written structure is closer to the oral code than to the written code (the studied population developed their language skills in constant contact with the written in its dual form). For instance, users of DWIM sometimes produced repetitions (which are forbidden in traditional writing), never used subject-verb inversion in interrogative sentences, sometimes replaced punctuation with emoticons, and used undefined deixis in their sentences. We were also able to show that having traditional reading and writing habits is not sufficient to create a predisposition towards the use of the DWIM code.
Segmentation d'images multispectrales par arbre de Markov caché flou (Multispectral Image Segmentation Using a Fuzzy Hidden Markov Tree)
We define a new unsupervised statistical segmentation tool based on a fuzzy hidden Markov tree model. Our fuzzy model combines the probabilistic uncertainty of the observed data with discrete and continuous thematic classes that represent the imprecision of the hidden data. The Bayesian segmentation technique used corresponds to the MPM (Mode of Posterior Marginals) criterion. Our approach makes it possible, on the one hand, to process objects containing diffuse structures, as is the case in astronomical imaging, and, on the other hand, to take into account multi-band data observed at different resolution levels and acquired by correlated sensors. We validate our model on synthetic images and on real multispectral images.
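As an illustration of the MPM criterion named in the abstract above, the following minimal sketch shows only the final decision step: each site is assigned the class whose posterior marginal is largest. It assumes the posterior marginals have already been computed (the abstract does not describe that computation) and it handles discrete classes only, not the continuous fuzzy classes of the authors' model. The function name `mpm_segment` and the example values are hypothetical.

```python
def mpm_segment(posterior_marginals):
    """Return, for each site, the index of the class with the largest posterior marginal.

    posterior_marginals: one list per site, holding p(class k | all observations).
    Assumption: discrete classes only; the marginals are supplied by the caller.
    """
    labels = []
    for marginals in posterior_marginals:
        best_class = max(range(len(marginals)), key=lambda k: marginals[k])
        labels.append(best_class)
    return labels


# Hypothetical posterior marginals for four pixels and three classes.
example_marginals = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
example_labels = mpm_segment(example_marginals)
```

Because the decision is taken site by site, MPM minimizes the expected number of misclassified sites, which is the usual reason for preferring it over a joint MAP estimate in segmentation.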
Making Sense of Variations: Introducing Alternatives in Speech Synthesis
This paper addresses the use of speech alternatives to enrich speech synthesis systems. Speech alternatives denote the variety of strategies that a speaker can use to pronounce a sentence, depending on pragmatic constraints, speaking style, and the specific strategies of the speaker. During training, the symbolic and acoustic characteristics of a unit-selection speech synthesis system are statistically modelled with context-dependent parametric models (GMMs/HMMs). During synthesis, symbolic and acoustic alternatives are exploited using a Generalized Viterbi Algorithm (GVA) to determine the sequence of speech units used for the synthesis. Objective and subjective evaluations provide evidence that the use of speech alternatives significantly improves speech synthesis over conventional speech synthesis systems.
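The unit search described in the abstract above can be pictured with a plain Viterbi search over candidate units. This is a simplified sketch, not the authors' Generalized Viterbi Algorithm: it keeps a single best path per candidate rather than several alternatives, and the cost functions, the function name `viterbi_unit_selection`, and the numeric "units" are all hypothetical stand-ins.

```python
def viterbi_unit_selection(candidates, target_cost, join_cost):
    """Pick one candidate unit per position, minimizing total target + join cost.

    candidates: non-empty list of non-empty lists of candidate units, one list per position.
    target_cost(i, unit): cost of using `unit` at position i (assumed given).
    join_cost(prev_unit, unit): cost of concatenating two units (assumed given).
    """
    # best[i][j] = (cost of the cheapest path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_costs = [
                best[i - 1][k][0] + join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(prev_costs)), key=prev_costs.__getitem__)
            row.append((prev_costs[k_best] + target_cost(i, u), k_best))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy example: units are pitch values, targets are desired pitches.
targets = [100, 110, 120]
toy_candidates = [[98, 130], [90, 112], [119, 150]]
toy_path, toy_total = viterbi_unit_selection(
    toy_candidates,
    target_cost=lambda i, u: abs(u - targets[i]),
    join_cost=lambda p, u: abs(p - u),
)
```

A generalized variant would keep several surviving paths per state instead of one, which is what allows alternative symbolic and acoustic realizations to compete until the end of the search.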
Automatic Phoneme Segmentation with Relaxed Textual Constraints
Very high quality text-to-speech synthesis can be achieved by unit selection in a large recorded speech corpus [1]. This technique selects an optimal sequence of speech units (e.g. phones) from the corpus and concatenates them to produce speech output. For various reasons, synthesis sometimes has to be done from existing recordings (rushes), possibly without a text transcription. When possible, however, the text of the corpus and the speaker are carefully chosen for the best phonetic and contextual coverage and for good voice quality and pronunciation, and the speaker is recorded in excellent conditions. Good phonetic coverage requires at least 5 hours of speech. Accurate segmentation of the phonetic units in such a large recording is a crucial step for speech synthesis quality. While this can be automated to some extent, it generally requires costly manual correction. This paper presents the development of an HMM-based phoneme segmentation system designed for corpus construction. (IRCAM internal reference: Lanchantin08a)
Dynamic Model Selection for Spectral Voice Conversion
IRCAM internal reference: Lanchantin10b
Objective Evaluation of the Dynamic Model Selection Method for Spectral Voice Conversion
Spectral voice conversion is usually performed using a single model, selected to represent a tradeoff between goodness of fit and complexity. Recently, we proposed a new method for spectral voice conversion, called Dynamic Model Selection (DMS), in which we assume that the model topology may change over time, depending on the source acoustic features. In this method, a set of models of increasing complexity is considered during the conversion of a source speech signal into a target speech signal. During the conversion, the best model is dynamically selected from the set according to the acoustic features of each source frame. In this paper, we present an objective evaluation demonstrating that this new method improves the conversion by reducing the transformation error compared to methods based on a single model.
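The per-frame selection step described in the abstract above can be sketched as follows. The abstract does not state the selection criterion, so this sketch assumes one: each frame is assigned to the model under which it has the highest likelihood. The models here are one-dimensional Gaussian mixtures with made-up parameters, and the names `gmm_log_likelihood` and `select_models` are hypothetical. The conversion step itself is not shown.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture.

    components: list of (weight, mean, variance) tuples.
    """
    density = sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components
    )
    return math.log(density)


def select_models(frames, models):
    """For each frame, return the index of the model with the highest likelihood.

    Assumption: likelihood is the selection criterion; the source does not say so.
    """
    return [
        max(range(len(models)), key=lambda k: gmm_log_likelihood(x, models[k]))
        for x in frames
    ]


# Two hypothetical models of increasing complexity (1 and 2 components).
toy_models = [
    [(1.0, 0.0, 4.0)],
    [(0.5, -2.0, 0.25), (0.5, 2.0, 0.25)],
]
toy_frames = [0.0, 2.0, -2.1]
toy_selection = select_models(toy_frames, toy_models)
```

In this toy case the broad single-component model wins for the frame near zero, while the two-component model wins for frames near its narrow modes, which illustrates how the chosen model can change from frame to frame.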
Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion
IRCAM internal reference: Lanchantin11b